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ABSTRACT 

Aims. We describe MS-MFS, a multi- scale multi-frequency deconvolution algorithm for wide-band synthesis-imaging, and present 
imaging results that illustrate the capabilities of the algorithm and the conditions under which it is feasible and gives accurate results. 
Methods. The MS-MFS algorithm models the wide-band sky-brightness distribution as a linear combination of spatial and spectral 
basis functions, and performs image-reconstruction by combining a linear-least- squares approach with iterative minimization. This 
method extends and combines the ideas used in the MS -CLEAN and MF-CLEAN algorithms for multi- scale and multi-frequency 
deconvolution respectively, and can be used in conjunction with existing wide-field imaging algorithms. We also discuss a simpler 
hybrid of spectral-line and continuum imaging methods and point out situations where it may suffice. 

Results. We show via simulations and application to multi-frequency VLA data and wideband EVLA data, that it is possible to 
reconstruct both spatial and spectral structure of compact and extended emission at the continuum sensitivity level and at the angular 
resolution allowed by the highest sampled frequency. 

Key words. Techniques :interferometric - Techniques: image processing - Methods: numerical - Radio continuum: general 



> 
in 

(N 

o 



1. Introduction 

Instruments such as the EVLA (Perley et al., 2009), ASKAP 
(Deboer et al., 2009) and LOEAR (de Vos et al., 2009) are among 
a new generation of broad-band radio interferometers, currently 
being designed and built to provide high-dynamic-range imag- 
ing capabilities. The large instantaneous band widths oflTered by 
new front-end systems increase the raw continuum sensitivity 
of these instruments and allow us to measure the spectral struc- 
ture of the incident radiation across large continuous frequency 
ranges. Until recently, the primary goal of wideband imaging 
has been to obtain a continuum image that makes use of the 
increased sensitivity and spatial-frequency coverage offered by 
combining multi-frequency measurements. So far, effects due to 
the spectral structure of the sky-brightness distribution have been 
considered mainly in the context of reducing errors in the contin- 
uum image, without paying attention to the accuracy of a spectral 
reconstruction. But now, the new bandwidths (100%) are large 
enough to allow the spectral structure to also be reconstructed to 
produce a meaningful astrophysical measurement. To do so, we 
need imaging algorithms that model and reconstruct both spatial 
and spectral structure simultaneously, and that are also sensitive 
to various eflTects of combining measurements from a large range 
of frequencies (namely varying ranges of sampled spatial scales 
and varying array-element response functions). 

The simplest method of wide-band image reconstruction is 
to image each frequency channel separately and combine the re- 
sults at the end. However, single-channel imaging is restricted 
to the narrow-band wv-coverage and sensitivity of the instru- 
ment, and source spectra can be studied only at the angular res- 
olution allowed by the lowest frequency in the sampled range. 
Also, for complicated extended emission, the single-frequency 



t/v-coverage may not be sufficient to produce a consistent solu- 
tion across frequency. While such imaging may suffice for many 
science goals, it does not take full advantage of what an instan- 
taneously wide-band instrument provides, namely the sensitivity 
and spatial-frequency coverage obtained by combining measure- 
ments from multiple receiver frequencies during image recon- 
struction. 

Multi-Erequency-Synthesis (MES) (Conway et al, 1990) is 
the technique of combining measurements at multiple discrete 
receiver frequencies during synthesis imaging. MES was ini- 
tially done to increase the aperture-plane coverage of sparse 
arrays by using narrow-band receivers and switching frequen- 
cies during the observations. Wide bandwidth systems (~ 10%) 
later presented the problem of bandwidth- smearing, which was 
eliminated by splitting the wide band into narrow-band channels 
and mapping them onto their correct spatial-frequencies during 
imaging. It was assumed that at the receiver sensitivities of the 
time, the measured sky brightness was constant across the ob- 
served bandwidth. The next step was to consider a frequency- 
dependent sky brightness distribution. Conway et al. (1990) de- 
scribe a double-deconvolution algorithm based on the instru- 
ment's responses to a series of spectral basis functions, in par- 
ticular, the first two terms of a Taylor series. A map of the av- 
erage spectral index is derived from the coefficient maps. Sault 
& Wieringa (1994) describe the ME-CLEAN algorithm which 
uses a formulation similar to double-deconvolution but calcu- 
lates Taylor-coefficients via a least- squares solution. More re- 
cently, Likhachev (2005) re-derives the least- squares method 
used in ME-CLEAN using more than two series coefficients. 

So far, these CLEAN-based multi-frequency deconvolution 
algorithms used point- source (zero- scale) flux components to 
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model the sky emission, a choice not well suited for extended 
emission. Cornwell (2008) describes the MS-CLEAN algorithm 
which does matched-filtering using templates constructed from 
the instrument response to various large scale flux components. 
Greisen et al. (2009) describe a method simular to MS-CLEAN, 
and Bhatnagar & Cornwell (2004) describe the ASP-CLEAN al- 
gorithm that explicitly fits for the parameters of Gaussian flux 
components and uses scale size to aid the separation of signal 
from noise. We show in this paper that with the MF-CLEAN 
approach, deconvolution errors that occur with a point- source 
model are enhanced in the spectral-index and spectral-curvature 
images because of error propagation eff'ects, and that the use of 
a multi- scale technique can minimize this. 

For high dynamic range imaging across wide fields-of-view, 
direction-dependent instrumental effects need to be accounted 
for. Bhatnagar et al. (2008) describe an algorithm for the cor- 
rection of time-variable and wide-field instrumental efl'ects for 
narrow-band interferometric imaging. For wide-field wide-band 
imaging, these algorithms must be extended to include the fre- 
quency dependence of the instrument. 

In this paper, we describe MS-MFS (multi- scale multi- 
frequency synthesis) as an algorithm that combines variants 
of the MF-CLEAN and MS-CLEAN approaches to simulta- 
neously reconstruct both spatial and spectral structure of the 
sky-brightness distribution. Frequency-dependent primary-beam 
correction is considered as a post-deconvolution correction 
step^ In section 5, we show imaging examples using simula- 
tions, multi-frequency VLA data and wideband EVLA data, to 
illustrate the capabilities and limits of the MS-MFS algorithm. 

1.1. Wide-Band Imaging 

We begin with a discussion of how well we can reconstruct both 
spatial and spectral information from an incomplete set of vis- 
ibilities sampled at multiple observing frequencies. An inter- 
ferometer samples the visibility function of the sky brightness 
distribution at a discrete set of spatial-frequencies (called the 
wv-coverage). The spatial-frequencies sampled at each observ- 
ing frequency v are between Wmin = ^^min and Wmax = ^^max, 
where u is used here as a generic label for the t/v-distance^ and 
b represents the length of the baseline vector (in units of me- 
ters) projected onto the plane perpendicular to the direction of 
the source. The maximum spatial-frequency measured at each 
frequency defines the angular resolution of the instrument at the 
frequency (Oy = 1 /wmax(v)). The range of spatial-frequencies be- 
tween i/min at Vmax and Wmax at Vmin represents the region where 
the visibility function is sampled at all frequencies in the band, 
and there is suflficient information to reconstruct both spatial and 
spectral structure. The spatial-frequencies outside this region are 
sampled only by a fraction of the band and the accuracy of a 
broad-band reconstruction depends on how well the spectral and 
spatial structure are constrained by an appropriate choice of flux 
model. 

Radio interferometric measurements of wideband continuum 
emission can be described as one or more of the following situ- 
ations. 



^ The integration of direction-dependent correction algorithms such 
as AW-Projection with MS-MFS will be discussed in a subsequent pa- 
per. 

^ The wv-distance is defined as Vi?Tv^ and is the radial distance of 
the spatial-frequency measured by the baseline from the origin of the 
wv-plane, in units of wavelength A. 



1. Flat Spectrum Sources : For a flat-spectrum source, mea- 
surements at multiple frequencies sample the same spatial 
structure, increasing the signal-to-noise of the measurements 
in regions of overlapping spatial-frequencies, and providing 
better overall i/v-plane filling. The angular resolution of the 
instrument is given by Wmax at Vmax- Standard deconvolution 
algorithms applied to measurements combined via MFS will 
suflfice to reconstruct source structure across the full range of 
spatial scales measured across the band. 

2. Unresolved sources with spectral structure : Consider a 
compact, unresolved source (with spectral structure) that is 
measured as a point source at all frequencies. The visibility 
function of a point source is flat across the entire spatial- 
frequency plane. Therefore, even if Wmax changes with fre- 
quency, the spectrum of the source is adequately sampled 
by the multi-frequency measurements. Using a flux model 
in which each source is a ^-function with a specific spectral 
model (for example, a smooth polynomial), it is possible to 
reconstruct the spectral structure of the source at the maxi- 
mum possible angular resolution (given by Wmax at Vmax)- 

3. Resolved sources with spectral structure : For resolved 
sources with spectral structure, the accuracy of the recon- 
struction across all spatial scales between Umm at Vmin and 
Umax at Vmax dcpcnds on an appropriate choice of flux model, 
and the constraints that it provides. For example, a source 
emitting broad-band synchrotron radiation can be described 
by a fixed brightness distribution at one frequency with a 
power-law spectrum associated with each location. Images 
can be made at the maximum angular resolution (given by 
Umax at Vmax) with the assumption that diff'erent observing 
frequencies probe the same spatial structure but measure dif- 
ferent amplitudes (usually a valid assumption). This con- 
straint is strong enough to correctly reconstruct even mod- 
erately resolved sources that are completely unresolved at 
the low end of the band but resolved at the higher end. On 
the other hand, a source whose structure itself changes by 
100% in amplitude across the band would break the above 
assumption (band-limited signals). In this case, a complete 
reconstruction would be possible only in the region of over- 
lapping spatial-frequencies (between Umin at Vmax and Umax at 
Vmin), unless the flux model includes constraints that bias the 
solution towards one appropriate for such sources. 

4. Spectral structure at large spatial scales : The lower end 
of the spatial-frequency range presents a diflferent problem. 
The size of the central hole in the t/v-coverage of a typi- 
cal interferometer increases with frequency, and spectra are 
not measured adequately at spatial scales corresponding to 
spatial-frequencies below Wmin at Vmax- In the extreme case 
where most of the visibility function lies within this wv-hole, 
a flat- spectrum large-scale source (for example) can be in- 
distinguishable from a relatively smaller source with a steep 
spectrum. Additional constraints in the form of short- spacing 
spectra may be required for an accurate reconstruction. 

5. Frequency dependence of the instrument : Array-element 
responses usually vary with frequency, direction and time. 
Standard calibration accounts for the frequency and time 
dependence for the direction in which the instrument is 
pointing. Away from the pointing-direction, the frequency- 
dependent shape of the primary-beam is the dominant re- 
maining instrumental eff'ect, and this results in artificial spec- 
tral structure in the images. To recover both spatial and spec- 
tral structure of the sky brightness across a large field of 
view, the frequency dependence of the primary beam must 
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be modeled and removed before or during multi-frequency 
synthesis imaging. 

To summarize, just as standard interferometric image re- 
construction uses a priori information about the spatial struc- 
ture of the sky to estimate the visibility function in unmeasured 
regions of the t/v-plane, multi-frequency image reconstruction 
algorithms need to use a priori information about the spectral 
structure of the sky brightness. By combining a suitable model 
with the known frequency-dependence of the spatial-frequency 
coverage and element response function, it is possible to recon- 
struct the broad-band sky brightness distribution from incom- 
plete spectral and spatial-frequency sampling. 

2. Multi-scale Multi-frequency deconvolution 

The MS-MFS algorithm described here is based on the iter- 
ative image-reconstruction framework described in Rau et al. 
(2009) and summarized in Appendix A. Sections 2.1 to 2.7 for- 
mulate the algorithm and summarize its implementation in the 
CASA package. Differences between the multi-scale and multi- 
frequency parts of MS-MFS with the original MF-CLEAN and 
MS-CLEAN approaches are highlighted in sections 3.1 and 3.2. 

2.1. Parameterization of spatial structure 

An image with multi- scale structure is written as a linear combi- 
nation of images at different spatial scales (Cornwell, 2008). 

jm^Y^lfP^lfy,^ (1) 

s=0 

where I'^ is a multi-scale model image^, and t^^'^ is a collection 
of J-f unctions that describe the locations and integrated ampli- 
tudes of flux components of scale s in the image. A^s is the num- 
ber of discrete spatial scales used to represent the image and f^^ 
is a tapered truncated parabola of width proportional to s. The 
symbol * denotes convolution. 



Here, represents an average spectral-index, and repre- 
sents spectral-curvature. The motivation behind this choice of 
interpretation is the fact that continuum synchrotron emission is 
usually modeled (and observed) as a power law distribution with 
frequency. Across the wide frequency ranges that new receivers 
are now sensitive to, spectral breaks, steepening and turnovers 
need to be factored into models, and the simplest way to include 
them and ensure smoothness, is spectral curvature^. 

A Taylor expansion of Eqn.3 yields the following expres- 
sions for the first three coeflScients from which the spectral index 

and curvature images can be computed algebraically. 

/ rsky/xsky _ x \ 
rm _ i-sky . jm _ xsky |-sky . jm _\ ^) rsky |-sky 

Note that with this choice of parameterization, we are using a 
polynomial to model a power-law, and A^t rapidly increases with 
bandwidth. A power- series expansion about and will 
yield a logarithmic expansion (i.e. / vs logy) which requires 
fewer coefficients to represent the same spectrum^. 

2.3. Multi -scale multi-frequency model 

A wideband model of the sky brightness distribution is con- 
structed from Eqns 1 and 2. A wideband flux component is a 
spatial basis function (/f^, Gaussian or parabola) whose inte- 
grated amplitude follows a Taylor polynomial in frequency. A 
region of emission in which the spectrum varies with position 
will be modeled as a sum of these wide-band flux components. 
The image-reconstruction process simultaneously solves for spa- 
tial and spectral coefficients of these flux components. 

The image at each frequency can be modeled as a linear com- 
bination of Taylor-coeflficient images at different spatial scales. 



2.2. Parameterization of spectral structure 

The spectrum of each flux component is modeled by a polyno- 
mial in frequency ( a Taylor series expansion about vq ). 

I- = Vwif^ where = (2) 
to \ ^0 / 

where if^ represents a multi- scale Taylor coefl&cient image, and 
Nt is the order of the Taylor series expansion. 

These Taylor coefficients are interpreted by choosing an 
astrophysically appropriate spectral model and performing a 
Taylor expansion to derive an expression that each coeffi- 
cient maps to. One practical choice is a power law with a 
varying index, represented by a second-order polynomial in 
log(/) vs log (^) space. 

/ S/J^+Zf log(^) 

= ^' (3) 

^ In this paper, superscripts for vectors and matrices indicate type 
(model, sky, observed, dirty, residual, etc), and subscripts in italics in- 
dicate enumeration indices (t, q for Taylor-term, s, p for spatial scale, 
y for frequency channel.). Non-italic subscripts indicate specific values 
of the enumerated indices (for example, 1^^, Iq or /«). 



Here, A^s is the number of discrete spatial scales used to represent 
the image and A^t is the order of the series expansion of the spec- 
trum. 7^,^^ represents a collection of ^-functions that describe the 

locations and integrated amplitudes of flux components of scale 
s in the image of the series coeflficient. 

2.4. Measurement equations 

The measurement equations^ for a sky brightness distribution 
parameterized by Eqn.5 are 



^ Wideband imaging algorithms described in Conway et al. (1990) 
and Sault & Wieringa (1994) use a fixed spectral index across the band, 
and handle slight curvature by performing multiple rounds of imaging 
after removing the dominant or average a at each stage. They also sug- 
gest using higher order polynomials to handle spectral curvature. 

^ Conway et al. (1990) state that the logarithmic expansion has better 
convergence properties than the linear expansion when a « 1. An even 
more compact representation is a polynomial in log / vs log y, but it be- 
comes numerically unstable to operate on logarithms and exponentials 
of pixel amplitudes, especially in the presence of noise. 

^ Appendix A contains an explanation of the matrix notation 
used here, and briefly describes standard radio-interferometric image- 
reconstruction within a least- squares model-fitting framework (mea- 
surement equations, normal equations, and iterative minimization). 
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-sky 



(6) 



t=0 s=0 



Vf'^ is a vector of n x 1 visibilities measured at frequency v. Wy 
are Taylor-weights (shown in Eqn.5). [Sy] is an n x m projection 
operator that represents the spatial-frequency sampling function 
for frequency v. The image-domain convolution of model 

with ^ is written as a spatial-frequency-domain multiplication. 

/fP * = [Ft][T,][F]/^^^ where [JsUxm = diag([F]lf^) is a 

spatial-frequency taper function. All images are vectors of shape 
m X 1. 

Eqn. 6 can be re- written to include all frequencies by stack- 
ing [Sy] for multiple frequencies. Let A^c be the number of fre- 
quencies. 



^obs ^ 22[W-fs][S][T,][F]/ 



sky 



(7) 



t=0 s=0 



where V^^^ is a vector of nNc x 1 visibilities, [S] is an nNc x m 
sampling matrix representing the multi-frequency wv-coverage 
of the synthesis array. [WJ^^^] is a diagonal ^A^c x matrix of 
weights, and consists of A^c diagonal blocks each of size nx n 
and containing w[. 

If the summations over t and s are written as a block-matrix 
dot-product, the full measurement matrix has the shape nNc x 
mNsNt. When multiplied by the set of N^Nt model sky vectors 
each of shape m x 1, it produces nNc visibilities. 
For A^t = 3,A^s = 2 the measurement equations can be written 
as follows, in block matrix form. The subscript p denotes the p^^ 
spatial scale and the subscript q denotes the q^^ Taylor coefficient 
of the spectrum polynomial. 



H H hi M h] 



rsky 
^ 



rsky 
^ 

1 

rsky 
^ 

2 

i-sky 
^ 1 



rsky 
^ 1 

1 

/Sky 
-* 1 



•bs 



(8) 



where [a.] = [Wf^][S][T^][F] 
for p e {0, A^s - 1} and q e {0,Nt - 1} 

2.5. Normal equations 

The normal equations for the system described in Eqn. 6 can be 
written in block matrix form, with each block-row (for scale size 
s, and Taylor term t) given by 

Z Z Z 1 '''' = V . 6 [0, A^s - 1], f e [0, M - 1] (9) 

Here, each [H^.p j is an m x m block of the Hessian matrix, and 
/^"^ is one of N^Ni dirty images. 



[H.l = [At][WnfA.l 



(10) 



= [FfT,F][FtS%;"f'%™Wf'SF][FfTpF] 
= [F^T.F] \^ wt^nF^SjWrS,F]| [FtT^F] 

= [F^T,F]|j]wt^«[Hv)|[F%F] 



(11) 
(12) 

(13) 
(14) 



[W™] is a diagonal matrix of data-weights (and imaging- 
weights) and [W™^**] is a diagonal matrix containing Taylor- 
weights w[. [Hy] = [F^SjW^SyF] is the Hessian matrix formed 
using only one frequency channel, and is a convolution opera- 
tor containing a shifted version of the single-frequency point- 
spread-function /P'*^ = J/ag[FtSfW™S] in each row (see ap- 
pendix A.2 for details). [F"^T^F] and [F^T^F] are also convolu- 
tion operators with f^"^ and fp"^ as their kernels. The process of 
convolution is associative and commutative, and therefore, ^ ] 
is also a convolution operator whose kernel is given by 



FPSf _ TShp 
^ s,p — s ^ 



t+q jpsf \ 



shp 



(15) 



The dirty images on the RHS of Eqn. 9 can be written as follows. 



jdirty _ 



[W^mjyobs 



= [F^T.S'^WJ^^'^^'^iy^^' 

= [F"^T,F][F"^S%^^^%^^]y"^^ 

= [F^T,F]|^wt[F^Sjwr]F^^ 



(16) 

(17) 
(18) 

(19) 
(20) 



where if"^^ = [f^ SlW^W^^' is the dirty image formed by direct 
Fourier inversion of weighted visibilities from one frequency 
channel. 

When all scales and Taylor terms are combined, the full 
Hessian matrix contains A^t^^s X NtNs blocks each of size mxm, 
and Nt Taylor coefficient images each of size m x 1, for all A^s 
spatial scales. 

The normal equations in block matrix form for the example 
in Eqn. 8 for A/t = 3,A^s = 2 is shown in Eqn. 21. The Hessian 
matrix consists of A^s x A^s = 2 x 2 blocks (the four quandrants 
of the matrix), each for one pair of spatial scale s, p (the upper 
indices). Within each quadrant, the A^t x A^t = 3x3 matrices 
correspond to various pairs of t, q (Taylor coefficient indices; the 
lower indices). This layout shows how the multi- scale and multi- 
frequency aspects of this imaging problem are combined and il- 
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lustrates the dependencies between the spatial and spectral basis 
functions. 



Ho,o Ho,o H 



0,0 

IH 0,0 H 0,0 H 0,0 H 0,1 
1,0 J L 1.1 J L 1.2 J L 1.0 

H 0,1 

2,0 



H 0,0 

L 2,1 



Hi,o 
L 0,1 



H 1,0 

0,2 



Hi 



H 0,1 

0,1 

H 0,1 



H 0,1 

2,1 



Hi 



H 0,1 

0,2 

H 0,1 



H 0,1 

2,2 



Hi,i 

0,2 



H 1,0 H 1,0 



0,0 J L 0,1 
H 1,0 I I H 1,1 1 [h 1,1 I I H 1,1 



Hi,o 

L 2,1 



Hi 



Hi 



Hi,i 

2,1 



Hi,i 

2,2 



fsky 



rsky 



i-sky 



i-sky 



I-sky 



/Sky 



|-dirty 
^ 



^dirty 
' 

2 

|-dirty 
^ 1 



wdirty 
^ 1 

1 

^dirty 



(21) 



This is the system of equations to be solved. The spatial- 
frequency sampling of a real interferometer is always incom- 
plete ([S] is rank-deficient). Therefore, each Hessian block, and 
the entire Hessian matrix is singular, and an exact inverse does 
not exist^. An accurate reconstruction can be obtained only via 
successive approximation (iterative numerical optimization). 



2.6. Principal Solution 

The principal solution^ of the normal equations is an approxi- 
mate solution that can be computed via diagonal approximations 
of all Hessian blocks. This solution is then used to pick flux com- 
ponents within the minor-cycle of iterative deconvolution. 

Each Hessian block is a convolution operator with a shifted 
version of a point- spread-function in each row (their centers 

t,q 

are aligned on the diagonal). A diagonal approximation repre- 
sents the assumption that the PSFs are ^-functions, or that the 
amplitudes in the dirty image at the location of a source reflect 
the true flux of the source. Also, with the assumption of spatially 
invariant PSFs, all elements on the diagonal within each Hessian 
block are the same. Therefore, we can reduce each [H j to one 

number. The full Hessian reduces to an A^tA^s x MA^s element ma- 
trix [HP^^^], The principal solution can be obtained by inverting 
[HP^^] once, and applying it to the dirty image vectors, one pixel 
at a time. Such a solution will be correct only at the locations of 
the centers of isolated flux-components and must be augmented 
with an iterative optimization approach to ensure accuracy. In 
the case of perfect sampling (where the Hessian blocks are truly 
diagonal and PSFs are ^-functions), the principal solution will 
directly give correct images of Taylor- series coefficients. 



^ Even if [H] were invertible, it is impractical to evaluate and invert 
the full Hessian (each row of each Hessian block represents an image). 

^ The principal solution (as defined in Bracewell & Roberts (1954) 
and used in Cornwell et al. (1999)) is a term specific to radio interferom- 
etry and represents the dirty image normalized by the sum of weights. 
It is the image formed purely from the measured data, with no contri- 
bution from the invisible distribution of images (unmeasured spatial- 
frequencies). It is also an approximate solution of the normal equations 
|-|_ljy^sky _ y^dirty ^g^^ Appendix A.2), calculated using a diagonal approx- 
imation of the Hessian. Each element on the diagonal is the peak of the 
PSF, which is also the sum of weights. For isolated sources, the values 
measured at the peaks of the principal solution (dirty) images are the 
true sky values. The CLEAN minor-cycle algorithm uses this fact to 
estimate source fluxes from the peaks of the normalized dirty image. 



2.6.1. Properties Of [Hp^^'^] 

Some properties of [HP^^^] are worth noting, to understand the 
numerical stability of this approach and its dependence on the 
choice of spectral and spatial basis functions (sky model), and 
spectral and spatial-frequency sampling functions (data and in- 
strument). 

1. There are N^Nt elements on the diagonal of [HP^^^]. Each is 
a measure of the instrument's sensitivity to a flux component 
of unit total flux whose shape and spectrum are described 
by one pair of spatial and spectral basis functions (if^^ and 
(^^) ^' ^^^^ diagonal element is given by 



mT"" = mid\l':!] = tr 



^wrr.sjwrs^T,] 



(22) 



V s e {O...Ns - 1} , t e {0...^ - 1} 



Note that the instrument's sampling function and data- 
weights are included in this expression. 

2. The ofl'-diagonal elements measure the orthogonality^ be- 
tween the various basis functions, for the given wv-coverage 
and weighting scheme. They measure the amount of over- 
lap between basis functions in the measurement domain. 
Smaller values indicate a more orthogonal set of basis func- 
tions that the instrument is better able to distinguish between. 

3. The condition number of this matrix (or of blocks within 
this matrix) will indicate if the chosen set of basis functions 
and spatial-frequency coverage provide enough constraints 
to provide a stable solution, and can be used as a metric to 
choose a suitable basis set. For a simple example, if a 3- 
term solution is attempted with data from only two distinct 
frequencies, [HP^^] will be singular. Or, for some choice 
of multi-frequency wv-coverage, the visibilities measured by 
the instrument for two different spatial scales may become 
hard to distinguish. Then, the cross-term element of [HP^^^] 
corresponding to this combination could have a higher value, 
indicating that the two parameters are highly coupled, and 
there is insufficient information in the data and sampling pat- 
tern to distinguish between the scales. A similar situation can 
arise to create ambiguity between spatial and spectral struc- 
ture (an extreme example is multi-frequency measurements 
from only one baseline). 

4. In general, [HP^^^] will be a positive-definite symmetric ma- 
trix whose inverse can be easily computed via sl Cholesky 
decomposition^^. The value of A^s is usually < 10, making 
the inversion of [HP^^^] tractable as a one-time operation. 

5. Some further approximations can be made about the struc- 
ture of [HP^^^] to simplify its inversion, and it is important 
to understand the numerical implications of these trade-offs. 
One is a block-diagonal approximation of [HP^^^] (i.e. us- 
ing only those blocks of the Hessian in Eqn. 21 for which 



^ The following definition of orthogonality is used here. Two vectors 
are orthogonal if their inner product is zero. The orthogonality of a pair 
of scale functions is measured by the integral of the product of their uv- 
taper functions. To account for wv-coverage, this integral is weighted by 
the sampling function. 

A Cholesky decomposition factors a symmetric positive-definite 
matrix into a lower triangular matrix and its conjugate transpose. It is 
used in the solution of system of equations [A]x = b where [A] is sym- 
metric positive-definite. The normal equations of a linear least-squares 
problem in which the signal is modeled as a linear combination of basis 
functions, are usually in this form (Press et al., 1988). 
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s = p\ top-left and bottom-right quadrants). This approxima- 
tion treats each spatial scale separately and assumes that the 
scale basis functions are orthogonal. For each scale, cross- 
terms between Taylor functions are preserved, and a multi- 
frequency principal solution can be obtained separately for 
each spatial scale. Note that a set of tapered truncated 
paraboloids is never orthogonal, but this separate- scale ap- 
proximation works because of the iterative -minimization 
process. This approximation makes the Hessian inversion 
easier, but to preserve accuracy, the update step of the itera- 
tive deconvolution still needs to evaluate the full LHS of the 
normal equations while subtracting out a flux component. 

2.7. MS-MFS algorithm 

This section describes an iterative process that solves the normal 
equations (Eqn.9) and produces a set of A^t Taylor-series coeffi- 
cient images at A^s different spatial scales. Appendix B lists the 
algorithm-steps in pseudo-code format, reflecting the implemen- 
tation of MS-MFS in the CASA^^ software package. 

Pre-compute Hessian : Each block of the Hessian is a convo- 
lution operator, consisting of a shifted version of the same con- 
volution kernel in each row. Therefore, it suffices to compute 
and store one kernel per Hessian block. Convolution kernels for 
all distinct blocks in the NsNt X A^s^^t Hessian are evaluated via 
Eqn. 15. All kernels are normalized by the sum-of- weights such 
that the peak of I^q^q is unity, and the relative weights between 

Hessian blocks is preserved. A block-diagonal approximation of 
[HP^^^] is done, and a set of A^s matrices each of shape NtXNt and 
denoted as [H^^^'^] are constructed. Their inverses are computed 
and stored in [H^^^'^ ]. 



Initialization : All A^s^^t model images are initialized to zero 
(or an a priori model to start from). 

Major and minor cycles : Iterative image reconstruction in 
radio interferometry is usually split into major and minor cycles 
(see Appendix A. 3). The major cycles compute the RHS of the 
normal equations, and the minor cycle inverts the Hessian and 
gets an estimate of the model. This model is used in the next 
major cycle to compute new RHS vectors from the residuals, and 
the process repeats itself until convergence is achieved. Steps 1 
and 5 (of the list of steps given below) form one major cycle, and 
repetitions of Steps 2 through 4 form the minor cycle. 

1. Compute residual images : The RHS vectors (residual or 
dirty images) /^^^ V ^ e {0, Nt-l} of the normal equations are 

computed via Eqn. 20 by first computing the multi-frequency 
dirty images and then convolving them by the scale basis 
functions. 

2. Find a Flux Component : The principal solution is com- 
puted for all pixels, one scale at a time. 



r |_| peak 1 ypix, dirty 



for each pixel, and scale ^(23) 



|-pix,psol 

Here, / 

location pix and scale s. [H^'''''^] is the block (of size 



pix,psol 



is a list of Nt Taylor-coefficients for the pixel- 
peak-. 



http://casa.nrao.edu Common Astronomy Software Applications is 
being developed at the http://www.nrao.eduNational Radio Astronomy 
Observatory 



Nt X Nt) in the list of diagonal-blocks of [Hp^^], and if""'^^^ 
is the NtX I vector constructed from /^^^^ V ^ g {0, Nt-l] for 

one pixel pix. This step is performed on all pixels, separately 
for all scales s, resulting in A^s sets of Nt Taylor-coefficient 
images. The most appropriate set of Taylor coefficients must 
now be chosen. The search is performed across pixels and 
scales. Many heuristics can be used here. For example, in 
iteration /,choose the A^t element solution-set with the dom- 
inant q = component across all scales and pixel locations. 
Or, pick the set of components that makes the largest impact 
on the value of Or, choose the location from the peak in 
the ^ = residual image, and compute the principal solution 
only for that pixel. 

The result of this step is a set of Nt model images, each con- 
taining one ^-function that marks the location of the center 
of a flux component of shape f^^^ (p represents the scale of 
the chosen component, out of all possible values of s). The 
amplitudes of these A^t ^-functions are the Taylor coefficients 
that model the spectrum of the integrated flux of this compo- 
nent. Let these A^t model images from iteration / be denoted 

as {/^^J;^€[0,M]. 

3. Update model images : A set of Nt multi-scale model im- 
ages are accumulated. 



\fqe [0,Nt] 



(24) 



where g is a loop-gain that takes on values between and 
1 and controls the step size for each iteration in the x^- 
minimization process. 
4. Update RHS : The RHS residual images^^ are updated by 
evaluating and subtracting out the entire LHS of the normal 
equations. However, since the chosen flux component cor- 
responds to just one scale, this becomes a summation over 
only Taylor terms (and not scale; there is only one scale p in 
the current model). 



jres 



M-1 

^—i L t,q 



(25) 



Repeat from Step 2 until the minor-cycle flux limit is 
reached. 

5. Predict : Model visibilities are computed from the set of 
Taylor-coefficient images via Eqn. 6. Residual visibilities are 
computed as V^^' = ^ - V^. 

Repeat from Step 1 until a global convergence criterion is 
satisfied. (After the first iteration, V^^^ is used as the new Vf*^ 
in Step 1). 



Restoration : After convergence, the model Taylor-coefficient 
images can be interpreted in different ways. 

1. The most obvious data products are the Taylor-coefficient 
images themselves, which are directly smoothed by the 
restoring beam. Residual images are added back in after 
computing the principal solution from the residuals obtained 
in the last instance of Step 1, to ensure that any undecon- 
volved flux has the correct flux values^^. 



In the first iteration, the RHS vectors are called the dirty-images. 
In all subsequent steps, the RHS vectors are formed after subtracting 
model- visibilities from the data and are called residual-images. 

As pointed out in section 2.6, this will be accurate only for isolated 
point sources that were left out of the minor cycle. 
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2. For the study of broad-band radio emission, the spectral co- 
efficients can be interpreted in terms of a power law in fre- 
quency with varying index (as described in Section 2.2). The 
data products are images of the reference-frequency flux 
the spectral-index and the spectral curvature 



Tin _ Tin / |-m 

= [/?//S^]-[OC-i)/2] 



3. 



(26) 
(27) 

(28) 



Spectral index and curvature images are calculated only in 
regions where the values in are above a chosen threshold. 
An image cube can be constructed by evaluating the spectral 
polynomial via Eqn. 2 for each frequency. This data product 
is useful for sources whose emission is not well modeled by 
a power law, but is a smooth polynomial in frequency. Band- 
limited signals that taper off smoothly in frequency are one 
example. 

When multiple sources along a given line-of- sight have dif- 
ferent spectra, the Taylor-coefficients will represent the com- 
bined spectrum. To compute spectral index and curvature 
maps for foreground sources, a polynomial background- 
subtraction must be done on the Taylor-coefficient images 
before Eqns26 to 28 are evaluated. 



3. Relation to MF-CLEAN and MS-CLEAN 

The MS-MFS algorithm is a combination of the general ideas 
used in MS-CLEAN and MF-CLEAN, but there are some subtle 
differences. The next two sections briefly discuss these differ- 
ences and their numerical implications. 

3.1. Relation to MF-CLEAN 

The MF-CLEAN algorithm (Sault & Wieringa, 1994) models 
the sky as a collection of point sources with a Taylor polynomial 
spectrum. A point-source version of MS-MFS can be derived by 
setting A^s = 1 and using if"^ = ^-function in the derivations 
in sections 2.4 and 2.5. The normal equations can be written in 
block matrix form (for example, for Nt = 3). 



[Ho,o] [Ho,i] [Ho,2] 
[Hi,o] [Hul [Hi,2] 

[H2,o] [H2,l] [H2,2] 



^0 

/Sky 
^1 




" |-dirty 
|-dirty 


L ^2 J 




^dirty 
L ^2 



(33) 



Each block [H^,^] is a convolution operator with I^^^ as its kernel 



V 



(34) 
(35) 



Primary-beam correction : For wide-field imaging, the spa- 
tial and spectral structure of the primary beam (array-element 
response function) contributes to the measured signal. If this 
instrumental effect is not accounted for, the output Taylor- 
coefficient images approximately represent the product of the 
sky and the primary-beam. 



fSky _ p Ftrue _ p i-tru 



i- 



(29) 



On the other hand, the MF-CLEAN algorithm described in Sault 
& Wieringa (1994) follows a matched-ffitering approach using 
functions called spectral-PSFs, which are equivalent to the con- 
volution kernels from the first row of Hessian blocks (q = 0) in 
Eqn.34. In MF-CLEAN, the Hessian elements and RHS vectors 
are calculated by convolving spectral-PSFs with themselves and 
the residual images. 



-1' 



2' 



q jpsf [ 



(36) 



Pvo is the primary beam at the reference frequency, and Pa and 
P(3 are spectral index and curvature due to the frequency depen- 
dence of the primary beam. 

A correction for the average primary-beam and its frequency 
dependence can be done as a post-deconvolution step. The 
primary-beams are first evaluated or measured as a function of 
frequency, and the frequency-dependence per pixel modeled by 
a power-law or a polynomial (perferably the same spectral poly- 
nomial used for the image reconstruction). Primary-beam cor- 
rection can then be done as follows. 



I, 



dirty 



rnew _ Tin / p 

Vo ~ ^Vq/^Vo 

|-new _ Tin _ p 

Fnew rm n 



(30) 
(31) 
(32) 



Note that if a polynomial is fit for the frequency-dependence of 
the primary beam, and Py^, Pa, P^ computed from it, the above 
operation is numerically identical to doing a polynomial division 
in terms of two sets of coefficients (for A^t <= 3). A brute-force 
polynomial-division using more series coefficients will yield a 
more accurate solution. Note however, that such a correction 
will not be accurate if there are time-dependent variations in 
the primary-beam, and will require integration with the AW- 
Projection algorithm discussed in Bhatnagar et al. (2008). 



(37) 



Formally, this matched filtering approach is exactly equal to 
the calculations shown in Eqns 34 and 35 only under the con- 
ditions that there is no overlap on the spatial-frequency plane 
between measurements from different observing frequencies, 
and all measurements are weighted equally across the spatial- 
frequency plane (uniform weighting). In practice, MF-CLEAN 
incurs errors for arrays with dense spatial-frequency coverage 
where tracks from different baselines and frequency-channels 
intersect. When applied to simulated EVLA data, numerical in- 
stabilities limited the fidelity of the final image, especially with 
extended emission, and this instability was eliminated by chang- 
ing the computations from Eqns 36 and 37 to Eqns 34 and 
35. Also, the MF-CLEAN formulation uses a two-term Taylor- 
polynomial, which can be shown to result in a dynamic-range 
limit of 10"^ for a bandwidth ratio of 2:1 and source spectral in- 
dex of -1.0. 

3.2. Relation to MS-CLEAN 

The MS-CLEAN algorithm (Cornwell, 2008; Greisen et al., 
2009) models the sky as a combination of multiscale flux com- 
ponents with no spectral structure. 
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A narrow-band (or flat- spectrum) version of MS-MFS can be 
derived by setting A^t = 1 in the derivations in sections 2.4 and 
2.5. The normal equations can be written in block matrix form 
(for example, for A^s =2). The peaks of the convolution kernels 
from the diagonal blocks of the Hessian are a measure of the 
sensitivity of the instrument to a particular spatial scale. 



[Ho,o] [Ho,i] 
[Hi,o] [Hi,i] 



|-dirty 



|-dirty 



(38) 



In MS-MFS, a diagonal approximation of this Hessian is 
used to compute the principal solution. This is equivalent to 
normalizing the residual images (RHS vectors) by the sum of 
weights for each spatial scale, before searching for peaks. 

In both existing forms of MS -CLEAN, this normalization is 
replaced by a scale bias, an empirical term that de-emphasises 
large spatial scales. The scale bias bs = I - 0.6 s/Sj^ax used by 
Cornwell (2008) (where ^max is the width of the largest scale ba- 
sis function) is a linear approximation of how the inverse of the 
area under each scale function changes with scale size^"^. The al- 
gorithm described by Greisen et al. (2009) uses bs ~ 1.0/^^^ 
where x e {0.2, 0.7}, to approximate a normalization by the 
area under a Gaussian, for the case when images are smoothed 
by applying a wv-taper that tends to unity for the zero spatial- 
frequency. Both these normalization schemes are approxima- 
tions of using a diagonal approximation of [HP^^^]. 

Once we have this understanding, we can see that the full 
Hessian [HP^^^] (and not just a diagonal approximation) can be 
inverted to get the normalization exactly right, giving an accu- 
rate estimate of the total flux at each scale. This becomes useful 
for sources that contain overlapping flux components of difl'erent 
spatial scales. However, this solution gives correct values only 
at the locations of the centers of the flux-components, and intro- 
duces large errors in the PSF sidelobes. Therefore, for reasons 
of stability, a diagonal approximation is still a more appropriate 
choice (demonstrated on simulated EVLA data). 

Another diff'erence lies in the minor-cycle updates. The up- 
date steps in MS-MFS and the Cornwell MS-CLEAN evaluate 
the full LHS of the normal equations to update the smoothed 
residual images and subtract out flux components within the im- 
age domain. This allows each minor cycle iteration to search 
for flux components across all spatial scales. The update step in 
the Greisen MS-CLEAN ignores the cross-terms, and performs a 
full set of minor cycle iterations on one scale at a time. A choice 
between all these methods will depend on trade-ofl's between the 
accuracy within each minor cycle iteration, the computational 
cost per step, and optimized global convergence patterns to con- 
trol the total number of iterations. 



4. Hybrids of Narrow-Band and Continuum 
Techniques 

The preceding sections discussed multi-frequency solutions that 
used data from all measured frequencies together, to take advan- 
tage of the combined spatial-frequency coverage. However, there 
are some situations where single-channel methods used in com- 
bination with multi-frequency- synthesis (and no built-in spectral 
model) will be able to deliver scientifically useful wide-band re- 
constructions at the continuum sensitivity. 



The basic idea of a hybrid wide-band method is to com- 
bine the advantages of single-channel imaging (simplicity and 
non-dependence on any spectral model) with those of continuum 
imaging (deconvolution with full continuum sensitivity). 

1. Deconvolve each channel separately upto the single-channel 
sensitivity limit (Tchan- Only sources brighter than (Tchan will 
be detected and deconvolved. 

2. Remove the contribution of bright (spectrally varying) 
sources by subtracting out visibilities predicted from the 
model image cube. At this stage, the peak residual bright- 
ness is at the level of the single-channel noise limit (Tchan- 

3. Perform MFS imaging (flat-spectrum assumption) on the 
continuum residuals to extract flux that lies between (Tchan 
and (Tcont- According to Conway et al. (1990) (and as dis- 
cussed later in section 6), errors due to this flat-spectrum 
assumption become visible only above a dynamic range of 
-1000 (for a = -0.7 and a 2:1 bandwidth ratio). Therefore, 
as long as the sensitivity improvement between a single- 
channel and the full band is less than ~1000, this second step 
will incur no errors even if the remaining flux has spectral 
structure. This requirement translates to Nchan < 10^, which 
is usually satisfied^^. 

4. Add model images from both steps, and restore the results. 
For unresolved sources, it may be appropriate to use a clean- 
beam fitted for the highest frequency, but in general, to not 
bias spectral information, all channels should be restored us- 
ing a clean beam fitted to the PSF at the lowest frequency in 
the range. 

The advantages of this approach are its simplicity, and that it 
can handle wide-band reconstructions with band-limited signals 
and spectral-lines. The disadvantages are that the angular reso- 
lution of the images and spectral information will be restricted 
to that given by the lowest frequency (a factor of two worse than 
what is possible for the 2:1 bandwidth available with the EVLA 
L-band). Also, the single-frequency spatial-frequency coverage 
may not be sufl&cient to unambiguously reconstruct all the spa- 
tial structure of interest at all frequencies, which in turn will af- 
fect spectra measured from these single-channel images. In some 
cases, additional constraints can be used. Bong et al. (2006) de- 
scribe spatio- spectral MEM, an entropy based method in which 
single-channel imaging is done along with a smoothness con- 
straint applied across frequency. 

In general, single-channel methods can be used for wide- 
band imaging mainly to construct an image of the continuum 
flux. Only if there is sufficient single-frequency wv-coverage to 
reconstruct an accurate model of the source structure (for exam- 
ple, fields of isolated point sources), may reliable spectral infor- 
mation also be derived from such an approach. 



5. IVIS-IVIFS Imaging results 

This section contains three imaging examples that demonstrate 
the MS-MFS algorithm's basic capabilities. Section 6 discusses 
various sources of error and how they manifest themselves, and 
section 7 demonstrates the limits to which the algorithm can be 
reasonably pushed. 



g When s/smax = 1.0 the bias term is 1.0 - 0.6 = 0.4 which is 
approximately equal to the inverse of the area under a Gaussian of unit 
peak and width, given by 1.0/ ^/27^ = 0.398. 



Even if visibilities are measured at a very high frequency resolu- 
tion, they can be averaged across frequency ranges upto the bandwidth- 
smearing limit for the desired image field-of-view. 
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Fig. 1. MS-MFS Imaging results using simulated EVLA data : These 
images compare truth images (left column) with the results of two wide- 
band imaging runs; multi- scale (middle column) and point- source (right 
column). The top three rows represent the first three Taylor-coefficient 
images, and the fourth and fifth rows show spectral index and spectral 
curvature respectively. 



5.1. EVLA simulation 

Data : Wide-band EVLA observations were simulated for a sky 
brightness distribution consisting of one point source with spec- 
tral index of -2.0 and two overlapping Gaussians with spectral 
indices of -1.0 and -hl.O. The spectral index across the resulting 
extended source varies smoothly between -1.0 and -\-1.0, with a 
spectral turnover in the region of overlap of the two Gaussians, 
corresponding to a spectral curvature of approximately -\-0.5. 
Data were simulated for the EVLA in C-configuration between 
1-2 GHz, for an 8-hour observation with measurements 5 min- 
utes apart. The goal of this test was to assess the ability of the 
MS-MFS algorithm to reconstruct both spatial and spectral in- 
formation accurately enough for astrophysical use. 

Imaging Results : Two MS-MFS imaging runs were done and 
the results compared. Fig.l illustrates the algorithm's perfor- 
mance by comparing several truth images describing the true 
sky brightness (left column) and reconstructions from a multi- 
scale (middle-column) and a point- source (right-column) ver- 
sion of MS-MFS. The multi-scale version used a flux model in 
which A^t = 3 and Ns = 4 with scale sizes defined by widths 
of 0, 6, 18, 24 pixels, and the point-source version used Nt = 3 
and = I with one scale function given by the ^-function 
(to emulate the MF-CLEAN algorithm). From top to bottom, 
the rows correspond to the intensity image at the reference fre- 



quency Iq = /vq, the first-order Taylor-coefificient I\ = laho, 
the second-order Taylor-coeflftcient /2 = (/a(^a - l)/2 + I^^Ivq, 
the spectral index = /i /Iq , and the spectral curvature /p = 

(/2//0)-/a(/a-l)/2. 

1. With a multi-scale multi-frequency flux model (MS-MFS) 
the spectral index across the extended source was recon- 
structed to an accuracy of Sa < 0.02 with the errors rising 
at the edges of the source where the signal-to-noise ratio de- 
creases. The spectral curvature across the extended source 
was estimated to an accuracy of SjS < 0.05 in the cen- 
tral region with the maximum error of Sfi ^ 0.2 in the re- 
gions where the curvature signal goes to zero and the source 
surface brightness is also minimum (the outer edges of the 
source). 

2. With a multi-frequency point- source model (MF-CLEAN) 
the accuracy of the spectral index and curvature maps was 
limited to Sa ^ 0.2, Sj3 ~ 0.6. This is because the use of 
a point source model will break any extended emission into 
components the size of the resolution element and this leads 
to deconvolution errors well above the ofif-source noise level. 
Error propagation during the computation of spectral index 
and curvature as ratios of these noisy Taylor-coefl&cient im- 
ages leads to high error levels in the result (see section 6.3). 
This clearly shows the importance of using a multiscale flux 
model when there is extended emission. 

3. The Gaussian on the left has a = -1.0 and ft = 0.0, and 
A/^t = 3 is not sufliicient to model the power-law accurately, 
leading to a value of J3 ^ -hO.l in the truth image as well as 
in the reconstruction. However, the Gaussian on the right has 
a = -\-l.0 and JS = 0.0 (a straight line), and A^t = 3 is more 
than sufliicient to model it, leading to an accurate value ofjS^ 
in the truth image and MS-MFS reconstruction. The point- 
source has a - -2.0, and the error on jS is proportionally 
higher. The use of A^t > 3 reduces these errors. 

5.2. Multi-frequency VLA observations of Cygnus-A 

Objective : Multi-frequency VLA observations of the bright 
radio galaxy Cygnus A were used to test the MS-MFS algo- 
rithm on real data and to test standard calibration methods on 
wide-band data. Cygnus A is an extremely bright (1000 Jy) ra- 
dio galaxy with a pair of bright compact hotspots (a = -0.5) 
located about 1 arcmin away from each other on either side of a 
very compact flat-spectrum core, and extended radio lobes asso- 
ciated with the hotspots with broad-band synchrotron emission 

at multiple spatial scales (a = -0.6 1.0) (Carilli & Barthel, 

1996). The best existing images of Cygnus-A and its spectral 
structure have been from large amounts of multi-configuration 
narrow-band VLA data (Carilli et al., 1991) designed to mea- 
sure the spatial structure as completely as possible at two widely 
separated frequencies (1.4 and 4.8 GHz). 

The goal of this test was to use multi-frequency snapshot 
observations of Cygnus A to evaluate how well the MS-MFS al- 
gorithm is able to reconstruct both spatial and spectral structure 
from measurements in which the single-frequency wv-coverage 
is insufificient to accurately reconstruct all the spatial structure at 
that frequency. 

Data were taken in April 2009 when the VLA was transi- 
tioning to the EVLA. 15 out of 27 antennas had new wideband 
EVLA feeds, but the correlator was that of the VLA (narrow- 
band). Wideband data were taken as a series of narrow-band 
snapshots spread across 8 hours and 9 distinct frequencies across 
the L-Band (30 min per frequency tuning). Flux calibration at 
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Fig. 2. Cygnus A - total intensity and spectral-index : This figure shows 
the MS-MFS total intensity map (top), and spectral-index maps ob- 
tained via three methods - MS-MFS (second row), a hybrid single-band 
method (third row), and from high-resolution full- synthesis narrow- 
band images at 1.4 and 4.8 GHz (bottom). The MS-MFS spectral index 
map has the angular-resolution of the total intensity map, and agrees 
with the values in the high-resolution comparison map (a = -0.5 at 
the hotspots increasing to or » -1.0 in the halo). However, the hybrid 
method resulted in a map with a wider range of values (positive and 
negative) and is at a much lower angular resolution. 



each frequency was done via observations of 3C286. Phase-only 
calibration was done using an existing narrow-band image of 
Cygnus A at 1 .4 GHz (Carilli et al., 1991) as a model, and further 
self-calibration was done with the wide-band flux model derived 
from the MS-MFS algorithm. The final dataset used for imaging 
consisted of 9 spectral windows each of a width of about 4 MHz 
and separated by about 100 MHz. 

These data were imaged using two methods, the MS-MFS al- 
gorithm with A^t = 3 and A^s = 10, and a hybrid of single-channel 
imaging followed by MFS on the residuals. Note that these ob- 
servations do not have dense single-frequency wv-coverage, and 
the purpose of applying the hybrid method was to emphasize the 
errors that can occur if this method is used inappropriately. Due 
to the small angular size of Cygnus-A, the eflPect of the L-band 
primary-beam was ignored in both runs. 



Results : Figure 2 shows the reconstructed total-intensity im- 
ages (top), the spectral-index map obtained via MS-MFS (sec- 
ond row) and the spectral-index map constructed from single- 
subband maps (third row). For comparison, the image at the bot- 
tom is a spectral-index map constructed from existing narrow- 
band images at 1.4 and 4.8 GHz, each constructed from a com- 
bination of VLA A, B, C and D configuration data (Carilli et al., 
1991). 

1 . The total intensity image (top row) has a peak brightness of 
77 mJy/beam at the hotspots and a peak brightness of about 
400 mJy/beam for the fainter extended parts of the halo. Both 
the methods (MS-MFS and hybrid) gave very similar total- 
intensity images and residual images. The on- source and off- 
source residuals for the MS-MFS algorithm are 30 mJy and 
25 mJy and with the hybrid algorithm are 50 mJy and 30 mJy 
respectively. 

2. The spatial structure seen in the MS-MFS spectral index im- 
age (second row) is very similar to that seen in the two-point 
(1.4 - 4.8 GHz) spectral index image (bottom). This shows 
that despite having a comparatively small amount of data (20 
VLA B -configuration snapshots at 9 frequencies) the use of 
an algorithm that models the sky brightness distribution ap- 
propriately is able to extract the same astrophysical infor- 
mation as traditional methods applied to large amounts of 
data (full synthesis runs in multiple VLA configurations at 
two frequencies). The estimated errors on the spectral index 
map are < 0.1 for the brighter regions of the source (near 
the hotspots) and > 0.2 for the fainter parts of the lobes and 
the core. In contrast, the Hybrid spectral index map (third 
row) clearly shows errors arising due to non-unique solutions 
at each separate frequency (due to insufl&cient narrow-band 
spatial-frequency coverage) as well as smoothing to the an- 
gular resolution at the lowest frequency. 

3. The MS-MFS spectral curvature map (not shown here) con- 
tains values corresponding to Aa ^ 0.4 for the brightest re- 
gions on the source. Such a change in a appears high, and we 
did not have sufl&cient single-frequency data at other bands 
to verify these values (the next section contains an example 
where we could verify the curvature maps with other data). 
Also, the values of curvature changed between imaging-runs 
with A^t = 3 and A^t = 4, suggesting over-fitting errors due to 
the low signal-to-noise ratio of the curvature measurement 
(which could arise from inaccurate bandpass calibration). 

5.3. Multi-frequency observations ofM87 

A similar observation of M87 was done with the goal of mea- 
suring the 1-2 GHz spectral index in diflferent parts of the inner 
core-jet-lobe system and outer halo filaments. 

M87 is a bright (200 Jy) radio galaxy located at the center 
of the Virgo cluster. The spatial distribution of broad-band syn- 
chrotron emission from this source consists of a bright central 
region (spanning a few arcmin) containing a flat-spectrum core, 
a jet (with known spectral index of -0.55) and two radio lobes 
with steeper spectra (-0.5 > a > -0.8) (Rottmann et al., 1996; 
Owen et al., 2000). This central region is surrounded by a large 
diflTuse radio halo (7 to 14 arcmin) with many bright narrow fil- 
aments (^ 10'' X 30. 

Multi-frequency VLA data were taken similar to the obser- 
vations of Cygnus-A, with a series of 10 snapshots at 16 differ- 
ent frequencies within the sensitivity range of the EVLA L-band 
receivers. These data were imaged using MS-MFS with Nt = 3 
and N2 = 12. The top row of images in Fig. 3 shows the intensity. 
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Fig. 3. M87 core/jet/lobe - Intensity, Spectral index and Curvature : 
These images show 3-arcsec resolution maps of the central bright re- 
gion of M87 (core+jet and inner lobes). The quantities displayed are the 
intensity at 1.5 GHz (top left), the spectral index (top middle) and the 
spectral curvature (top right). The spectral index is near zero at the core, 
varies between -0.36 and -0.6 along the jet and out into the lobes. The 
spectral curvature is on average 0.5 which translates to Aof = 0.2 across 
L-band. The plot at the bottom compares the resulting average spectrum 
(solid line) with that formed by imaging each spectral-window sepa- 
rately (dots). The dashed and dotted lines correspond to fixed values of 
spectral index (-0.43,-0.53,-0.63). 



spectral index and spectral curvature maps of the bright central 
region at an angular resolution of 3 arcsec (C+B -configuration). 

1 . The peak brightness at the center of the final restored inten- 
sity image was 15 Jy with an oflf-source RMS of 1.8 mJy and 
an on-source RMS of between 3 and 10 mJy. The spectral 
index map^^ of the bright central region (at 3 arcsec resolu- 
tion) shows a near flat- spectrum core with q^ll = -0.25, a jet 
with ofLL = -0.52 and lobes with -0.6 > ckll > -0.7. This 
bright central region had suflftcient (>100) signal-to-noise to 
be able to detect spectral curvature, and no obvious decon- 
volution errors. However the error bars on the spectral cur- 
vature are at the same level as the measurement itself, and 
a reliable estimate can only be obtained as an average over 
this entire bright region. The average curvature is measured 
to be y^LL = -0.5 which corresponds to a change in a across 
L-band by = ^ -0.2. 

2. To verify consistency of this spectral-curvature value with 
the data, each of the 16 spectral- windows was imaged sepa- 
rately, and restored with the same clean-beam. The plot at the 
bottom of Fig. 3 shows this integrated flux spectrum (log(/) 
vs log(y/yo)) as round dots, along with the average spectrum 
calculated by MS-MFS (curved line), and straight dashed 



The spectral index between two frequency bands A and B will be 
denoted as qtab- For example, the symbol ofpL corresponds to the fre- 
quency range between P-band (327 MHz) and L-band (1.4 GHz), and 
QfLL corresponds to two frequencies within L-band (here, 1.1 and 1.8 
GHz). A similar convention will be used for spectral curvature p. 



and dotted lines that correspond to constant spectral indices 
of -0.43, -0.53 and -0.63. These plots show that a change 
in a of about 0.2 is consistent with the data. Note that the 
scatter seen in the single- spectral- window data points is at 
the 1% level of the source flux. This illustrates the accuracy 
at which bandpass calibration must be done in order to mea- 
sure a physically plausible spectral-curvature signal across a 
2:1 bandwidth. 

3. These numbers were further compared with two-point spec- 
tral indices computed between 327 MHz (P-band), 1.4 
GHz (L-band), and 4.8 GHz (C-band) from existing images 
(Owen et al., 2000), (Owen, F. (private communication)). 
Across the bright central region, two-point spectral indices 
are -0.36 > ofpL > -0.45 and -0.5 > ofLC > -0.7. The mea- 
sured values from our experiment (-0.4 > q^ll > -0.7 and 
Aof ^ 0.2 across L-band) are consistent with these indepen- 
dent calculations. 



6. Sources of Error 

There are various sources of error that can aflTect the imaging 
process, and leave artifacts both on-source and oflf-source. As 
with any image-reconstruction algorithm, signs of these errors 
must be looked-for in the output images before astrophysical in- 
terpretation. 



6.1. On-source polynomial-fit errors 

The errors on the polynomial coefliicients and quantities derived 
from them will depend on the number of measurements of the 
spectrum, their distribution across a frequency range, the signal- 
to-noise ratio of the pattern being fitted, and the order of the 
polynomial used in the fit. These errors will affect regions in 
the image both on-source and oflT-source, and the resulting er- 
ror patterns and their magnitudes will depend on the available 
wv-coverage, and the choice of reference frequency. Although 
the physical parameters ly^ , and /p can be obtained from the 
first three coefliicients of a Taylor expansion of a power-law with 
varying index (Eqns.26 to 28), a higher order polynomial may 
be required during the fitting process to improve the accuracy of 
the first three coefl&cients^^. 

Figure.4 summarizes the errors obtained when the order of 
the polynomial chosen for imaging is not sufficient to model 
the power-law spectrum of the source. Data for 8 hour synthe- 
sis runa with EVLA wv-coverages were simulated for 5 diflTer- 
ent frequency ranges around 2.0 GHz. The sky brightness dis- 
tribution used for the simulation was one point source whose 
flux is 1.0 Jy and spectral index is -1.0 with no spectral curva- 
ture. The bandwidth ratios^^for these 5 datasets were 100%(3: 1), 
66%(2:1), 50%(1.67:1), 25%(1.28:1), 10%(1.1:1). 

MS-MFS was repeated on all these datasets with A^t = 1 to 
A^t = 7, and one scale A^s = 1, a ^-function. All these datasets 
were imaged using a maximum of 10 iterations, a loop-gain 
of 1.0, natural weighting and a flux threshold of 1.0 jiJy. No 



Conway et al. (1990) comment on a bias that occurs with a 2-term 
Taylor expansion, due to the use of a polynomial of insufficient order to 
model an exponential. 

There are two definitions of bandwidth ratio that are used in radio 
interferometry. One is the ratio of the highest to the lowest frequency in 
the band, and is denoted as Vhigh : viow Another definition is the ratio of 
the total bandwidth to the central frequency (vhigh - viow)/vmid expressed 
as a percentage. For example, the bandwidth ratio for viow = 1.0 GHz, 
Vhigh = 2.0 GHz is 2 : 1 and 66%. 
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Fig. 4. Peak Residuals and Errors for MFS with different values of A^t • 
These plots show the measured peak residuals (top left) and the errors 
on /vo(top right), (bottom left), and /p (bottom right) when a point- 
source of flux 1.0 Jy and a=-1.0 was imaged using Taylor polynomials 
of different orders (A^t = 1 - 7) and a linear spectral basis. The x-axis of 
all these plots show the value of A^t used for the simulation and plots for 
a and /3 begin from N[-2 and Nt = 3 respectively. An example of how 
to read these plots : For a 2:1 bandwidth ratio, a source with spectral 
index = -1.0 and Nt = 4, the achievable dynamic range (measured as 
the ratio of the peak flux to the peak residual near the source) is about 
10^, the error on the peak flux at the reference frequency is 1 part in 10^, 
and the absolute errors on a anre /3 are 10~^ and 10~^ respectively. 



noise was added to these simulations (in order to isolate and 
measure numerical errors due to the spectral fits). The top left 
panel in Fig.4 shows the peak residuals, measured over the en- 
tire 0^^ order residual image. The other three panels show on- 
source errors for ly^ (top right), (bottom left) and /p (bot- 
tom right). Errors on /vo,/a,/(3 were computed at the location 
of the point source by taking differences with the ideal values of 
/^^ = 1.0,0^ = -1.0,yS = 0.0. 

One noticeable trend from these plots is that with sufficient 
signal-to-noise in the measurements, all errors decrease expo- 
nentially (linearly in log- space) as a function of increasing or- 
der of the polynomial, and as a function of decreasing total 
bandwidth. As expected, for very narrow bandwidths, the use 
of high-order polynomials increases the error. Also, when the 
order of the polynomial used is too low, the peak residuals are 
much smaller than the on- source error incurred on ly^, la and /p. 
These trends are based on one simple example, and represent the 
best-case scenario in which all sources can be described as point 
sources. For extended emission, there can be additional errors 
due to deconvolution artifacts. In the case of very noisy spectra, 
or at a late stage of image reconstruction when the signal-to- 
noise ratio in the residual images is low, errors can arise from 
attempting to use too many terms in the polynomial fit. 

6.2. Off-source errors and dynamic-range limits 

Consider the errors on the continuum image when spectral struc- 
ture is ignored during MFS imaging (A^t = 0; flat- spectrum as- 



Fig. 5. Stokes I images of the 3C286 field, using MSMFS with M = 1 
(top left), M = 2 (top right), M = 3 (bottom left), M = 4 (bottom left). 
The peak flux of 3C286 is 14 Jy/beam. The axes labels are in units of 
arc-minutes from the pointing center and all images are shown with the 
same grey-scales. The residual RMS near 3C286 for these four images 
are 9 mJy, 1 mJy, 200 jiJy and 140 jiJy, and the RMS measured 1 degree 
away from 3C286 are 1 mJy, 200 jiJy , 85 |iJy and 80 jiJy. The thermal- 
noise limit for this dataset was 70 jiJy. 



sumption). Spectral structure will masquerade as spurious spa- 
tial structure, leading to error patterns that resemble the instru- 
ment's response to the first-order term in the Taylor-series ex- 
pansion. A rough rule of thumb for EVLA wv-coverages is that 
for a point source with spectral index a = -1.0 measured be- 
tween 1 and 2 GHz, the peak error obtained if the spectrum is 
ignored is at a dynamic range of < 10^. In other words, if the 
dynamic-range allowed by the data (peak brightness / thermal 
noise) is 800 (for example), there will be no visible artifacts if the 
spectral structure upto a = -1.0 is ignored. These numbers are 
consistent with error estimates derived in Conway et al. (1990) 
that predict errors at the level of la/X where X = 0(500) is a 
factor that depends on the wv-coverage of the instrument and the 
choice of reference frequency. Therefore, if the only goal is to 
obtain a continuum image over a narrow field-of-view, it may be 
possible to achieve the maximum-possible dynamic range by di- 
viding out an average spectral index (one single number over the 
entire sky) from the visibilities before imaging them via MFS 
with a flat- spectrum assumption (Conway et al., 1990). 

At high dynamic-ranges, off- source errors trace the spec- 
tral response functions for higher-order Taylor terms. As long 
as they are visible above the noise, they can be eliminated with 
a higher-order polynomial fit. Figure 5 shows a set of four im- 
ages of the 3C286 field made from wideband EVLA data taken 
in October 2010. These data consist of four EVLA snapshots 
spread across 90 minutes, and cover a frequency range of 1.02 
GHz to 2.1 GHz (800 MHz usable bandwidth after accounting 
for radio-frequency-interference). 3C286 is 14.4 Jy/beam at 1.5 
GHz, with a spectral index of -0.47. The four panels correspond 
to MS-MFS imaging runs with Nt = 1 (top left), Nt = 2 (top 
right), Nt = 3 (bottom left), Nt = 4 (bottom left), all with A^s = 1 
(a ^-function). All images are displayed with the same grey- scale 
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levels, and show the levels at which the error patterns appear. 
The dynamic range (calculated as the peak brightness to the peak 
residual measured near the brightest peak) ranges from 1.6 x 10^ 
when spectral structure is ignored (Nt = 1), to 1.1 X 10^ with 
A^t = 4. 

6.3. Propagation of multi-scale errors 

Deconvolution errors contribute to the on- source error in the 
Taylor coefficient images, and these errors propagate to the spec- 
tral index map which is computed as a ratio of two coefficient 
images (la = h/h)- Errors that result when a point-source flux 
model is used for extended emission can increase the error bars 
on the spectral index and curvature by an order of magnitude (as 
demonstrated by the example shown in Fig. 1). These errors are 
approximately given by 

where A/q and A/i are the absolute errors measured in the first 
two Taylor-coefficient images. For the example shown in Fig. 1, 
the errors on the coefficient images were measured as the RMS 
of the deviation from the truth images within the region of high 
signal-to-noise. These errors result in a prediction of A/^ ^0.05 
for the multi-scale version, and A/^ ~ 0.7 for the point-source 
version, which approximately matches the RMS of the deviation 
of the output a image from the truth a image. 

6.4. Frequency dependence of the Primary Beam 

When imaging wide fields-of-view, sources away from the point- 
ing center will be attenuated by the value of the primary beam 
at each frequency. The EVLA primary-beams across a 2: 1 band- 
width contribute an extra spectral-index of -1 .4 at the half-power 
point (measured from simulated beams, as well as a Gaussian 
approximation of the main lobe of the primary beam (Sault & 
Wieringa, 1994)). Therefore, even if a source has a flat spectrum, 
this artificial spectral index can result in imaging artifacts at the 
levels described in section 6 (i.e. a dynamic range limit of ~ 10^ 
for a flat spectrum source at the HPBW, due only to the spectral 
variation of the primary beam). This efl'ect can be corrected as 
a post-deconvolution step, or by using wide-field imaging algo- 
rithms along with MS-MFS. So far, accurate post-deconvolution 
corrections have been demonstrated out to the 10% level of the 
highest-frequency primary-beam. 

7. Imaging Performance (non-standard conditions) 

This section describes a set of simulations that test the limits of 
MS-MFS for different types of source structure. Three cases are 
studied; sources unresolved at the lowest sampled frequency and 
resolved at the upper end, sources whose visibility functions lie 
mostly within the central unsampled region of the wv-plane near 
the origin, and band-limited signals. 

7.1. Moderately resolved sources 

Consider a source with broad-band continuum emission and spa- 
tial structure that is unresolved at the low-frequency end of the 
band and resolved at the high-frequency end. Traditionally, spec- 
tral information would be available only at the angular resolution 
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Fig. 6. Moderately Resolved Sources - Single-Channel images : These 
figures show the 6 single-channel images generated from simulated 
EVLA data between 1 and 4 GHz in the EVLA D-configuration. The 
sky brightness consists of two point sources, each of flux 1.0 Jy at a 
reference frequency of 2.5 GHz and separated by 18 arcsec. Their spec- 
tral indices are +1.0 (top source) and -1.0 (bottom source). The angular 
resolution at 1 GHz is 60 arcsec, and at 4 GHz is 15 arcsec and the cir- 
cles in the lower left corner of each image shows the resolution element 
decreasing in size as frequency increases. 
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Fig. 7. Moderately Resolved Sources - MSMFS Images : These images 
show the intensity at 2.5 GHz (left), the spectral index showing a gradi- 
ent between - 1 and + 1 (middle) and the spectral curvature which peaks 
between the two sources and falls off" on either side (right). The angu- 
lar resolution of these images is 15 arcsec, corresponding to the highest 
frequency in the data (and Fig. 6). 

offered by the lowest observed frequency. With MS-MFS, the in- 
tensity distribution as well as the spectral index of such emission 
can be imaged at the angular resolution allowed by the highest 
frequency in the band. This is because compact emission has a 
signature all across the spatial-frequency plane and its spectrum 
is well sampled by the measurements. The highest frequencies 
constrain the spatial structure, and the flux model (in which a 
spectrum is associated with each flux component) naturally fits 
a spectrum at the angular resolution at which the spatial structure 
is modeled. Such a reconstruction is model-dependent and will 
be accurate only if the spectrum at the smallest measured scales 
can really be represented as a polynomial. 

EVLA Simulation : Wide-band EVLA data were simulated for 
the D-configuration across a frequency range of 3.0 GHz with 
6 frequency channels between 1 and 4 GHz (600 MHz apart). 
This wide frequency range was chosen to emphasize the diff'er- 
ence in angular resolution at the two ends of the band (60 arcsec 
at 1 GHz, and 15 arcsec at 4.0 GHz). The sky brightness cho- 
sen for this test consists of a pair of point sources separated by a 
distance of 1 8 arcsec (about one resolution element at the high- 
est frequency), making this a moderately resolved source. These 



13 



U.Rau and TJ.Cornwell: Multi-Scale Multi-Frequency Synthesis Imaging in Radio Interferometry 



Intensity (jy/beom) Spectral Index Residuol (Jy/beom) 




Fig. 8. Very Large Spatial Scales - Intensity, Spectral Index, Residuals : 
These images show the intensity distribution (left), spectral index (mid- 
dle) and the residuals (right) for two imaging runs. The top row shows 
results without short- spacing information, and shows a false negative 
a ^ -0.8 for the extended emission. The bottom row shows results 
with short- spacing constraints, in which the extended source has been 
reconstructed correctly. 



point sources were given different spectral indices (+1.0 for the 
top source and -1.0 for the bottom one). Figure 6 shows the six 
single-channel images of this source. At the low frequency end, 
the source is almost indistinguishable from a single flux com- 
ponent centered at the location of the bottom source whose flux 
peaks at the low-frequency end. The source structure becomes 
apparent only in the higher frequencies where the top source 
(with a positive spectral index) is brighter. 

MSMFS Imaging results : These data were imaged using the 
MS-MFS algorithm with Nt = 3 and A^s = 1 with only one spa- 
tial scale (a ^-function), and figure 7 shows the results. The in- 
tensity distribution, spectral index and curvature of this source 
were recovered at the angular resolution allowed by the 4.0 GHz 
samples (15 arcsec). This demonstrates that an appropriate flux 
model can constrain the solution accurately at the angular reso- 
lution given by the highest sampled frequency. 

7.2. Emission at large spatial scales 

Consider a very large (extended) flat-spectrum source whose vis- 
ibility function falls mainly within the central hole in the uv- 
coverage. The size of this central hole increases with observing 
frequency. The minimum spatial-frequency sampled per chan- 
nel will measure a decreasing peak flux level as frequency in- 
creases. Since the reconstruction below the minimum spatial- 
frequency involves an extrapolation of the measurements and 
is un-constrained by the data, these decreasing peak visibility 
levels can be mistakenly interpreted as a source whose ampli- 
tude itself is decreasing with frequency (a less-extended source 
with a steep spectrum). Usually, a physically-realistic flux model 
suffices as a constraint, but with the MS-MFS model (polyno- 
mial spectra associated with 2D Gaussian-like components), a 
large flat- spectrum source and a smaller steep- spectrum source 
are both allowed and considered equally probable. This creates 
an ambiguity between the reconstructed scale and spectrum that 
cannot always be resolved directly from the data, and requires 
additional information (a low-frequency narrow-band image at 
the reference frequency to constrain the spatial structure, or low- 
resolution spectral information). 




Fig. 9. Very Large Spatial Scales - Visibility plots : These plots show 
the observed (left) and reconstructed (right) visibility functions for a 
simulation in which a large extended flat- spectrum source is observed 
with an interferometer with a large central hole in its wv-coverage. The 
colours/shades in these plots represent 6 frequency channels spread be- 
tween 1 and 4 GHz. The top row of plots shows that when no short- 
spacing information was used, these data can be mistakenly fit using a 
less-extended source with a steep spectrum. The bottom row shows that 
the inclusion of short- spacing information is sufficient to reconstruct the 
sky brightness distribution correctly. 



EVLA Simulation : Wide-band EYLA data were simulated for 
the D-conffguration across a frequency range of 3.0 GHz cen- 
tred at 2.5 GHz. (6 frequency channels located 600 MHz apart 
between 1.0 and 4.0 GHz). The size of the central hole in the 
wv-coverage was increased by ff agging all baselines shorter than 
100 m and the wide frequency range was chosen to emphasize 
the diff'erence between the largest spatial scale measured at each 
frequency (0.3 kA or 10.3 arcmin at 1.0 GHz, and 1.3 k/1 or 2.5 
arcmin at 4.0 GHz). The sky brightness chosen for this test con- 
sists of one large ffat-spectrum (a = 0.0) 2D Gaussian whose 
FWHM is 2.0 arcmin (corresponding to 1.6 kA, at the reference 
frequency of 2.5 GHz), and one steep spectrum point- source 
(a=-l.O) located on top of this extended source at 30 arcsec away 
from its peak. 

MS-MFS Imaging Results : These data were imaged using 
the MS-MFS algorithm with A^t = 3 and A^s = 3 with scale 
sizes given by [0,10,30] pixels. Two imaging runs were per- 
formed with these parameters, without and with short- spacing 
information. Fig. 8 shows images of the intensity (left), spec- 
tral index (middle) and residuals (right) for these two runs, and 
Fig. 9 shows the visibility amplitudes present in the simulated 
data (left column) as well as in the reconstructed model (right 
column) at each of the 6 frequencies. In both ffgures, the top and 
bottom rows correspond to runs without and with short- spacing 
information respectively and each pair (in Fig. 8) are displayed 
at the same intensity scale. 

1. The ffrst imaging run used only baselines longer than 100m 
to simulate a large central hole in the wv-coverage. The spec- 
trum of the point-source was correctly estimated as -1.0, but 
the extended source acquired a false steep- spectrum (a ^ 
-0.8). The algorithm was able to reconstruct the correct ffux 
and spectrum of the extended source only if the multi- scale 
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basis functions were carefully chosen to match the known 
scale size (i.e. a stronger a-priori constraint). This shows 
that without additional constraints, it is not always possible 
to distinguish large flat-spectrum source from a slightly less- 
extended steep spectrum source. 

2. A second imaging run was performed including additional 
information in the form of short- spacing constraints, added 
in by retaining a small number of very short-baseline mea- 
surements at each frequency (baselines shorter than 25 m). 
The visibility plots and imaging results with this dataset 
show that the short- spacing flux estimates were sufficient to 
bias the solution towards the correct solution (flat- spectrum 
extended source). 

3. The image residuals are at the same level in both runs, 
demonstrating that in the absence of additional short-spacing 
information, both flux models are equally poorly constrained 
by the data themselves. One way to avoid this problem al- 
together (but lose some information) is to flag all spatial- 
frequencies smaller than Wmin at Vmax and not attempt to re- 
construct any spatial scales larger than what Vmax allows. 

7.3. Band-limited signals 

Consider a source of radiation where frequency traces different 
physical structures in the source (as opposed to a fixed struc- 
ture with each point emitting broad-band radiation). For such 
sources, emission may be detected in only part of the sampled 
frequency range and is for all practical purposes a band-limited 
signal. 

Since the MS-MFS algorithm uses a polynomial to model 
the spectrum of the source (and is not restricted to a power-law 
spectrum) it should be able to reconstruct such structure as long 
as it varies smoothly. However, for a band-limited signal, the 
angular resolution at which the structure can be mapped will be 
limited to the resolution of the highest frequency at which the 
signal is detected (and not the highest resolution allowed by the 
measurements). 

EVLA D-configuration data were simulated for the band- 
limited radiation observed with synchrotron emission from solar 
prominences where different frequencies probe different depths 
in the solar atmosphere. The structures are generally arch-like 
with lower frequencies sampling the top of the loop and higher 
frequencies sampling the legs. The MS-MFS algorithm was run 
on these simulated data, using A^t = 5 to fit a 4^^ -order poly- 
nomial to the source spectrum (to accomodate its nearly band- 
limited nature) and A^s = 3 with scales given by [0, 10, 30] pixels. 
Iterations were terminated after 200 iterations. 

Fig. 10 shows a comparison of the true and reconstructed 
structure at three different frequencies (1.2 GHz (left), 1.8 GHz 
(middle) and 2.6 GHz (right) ). The top row shows the true struc- 
ture, and the bottom row is the reconstruction. The 3D structure 
is mostly recovered, with the largest errors being in the central 
region where the signal spans the shortest bandwidth. The point 
source on the right edge of the loop was reconstructed at an an- 
gular resolution slightly lower than that of the highest sampled 
frequency and corresponds to the highest frequency at which 
this spot is brighter than the background emission. Fig. 11 com- 
pares the true and reconstructed spectra for three positions in the 
source (left leg (left), arch of the loop (middle), and right leg 
(right)). These spectra show the accuracy to which a 4^^ -order 
polynomial with MS-MFS was able to reconstruct the structure. 

Spectral-lines are an extreme case of a band-limited signal, 
and the use of MS-MFS imaging is restricted to obtaining a 
wide-band model of the continuum flux from line-free channels. 




Fig. 10. Band-limited Signals - Multi-frequency images : These images 
show a comparison between the true sky brightness (top row) and the 
brightness reconstructed using the MS-MFS algorithm (bottom row) at 
a set of three frequencies (1.0, 1.8, and 2.6 GHz from left to right)). 
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Fig. 11. Band-Limited Signals - Spectra across the source : These plots 
show the true (down-arrows) and reconstructed (up-arrows) spectra at 
different locations for the example discussed in this section (shown in 
Fig. 10). The left column corresponds to the left end of the loop at the 
location of the leg and shows smooth structure stretching almost all 
across the band. The middle column corresponds to the middle of the 
source where the only structure in the line-of- sight is the upper part of 
the loop (emission at a small fraction of the band). The right column 
shows spectra for the brightest point on the right end of the loop. 

to be subtracted out of the data before spectral-line strengths are 
studied. 

8. Discussion 

The introduction of broad-band receivers into radio interferom- 
etry has opened up new opportunities for the study of wide-band 
continuum emission from a vast range of astrophysical objects. 
With imaging algorithms that account for the frequency depen- 
dence of the incoming radiation as well as of the instrument, 
we can minimize imaging artifacts, achieve continuum sensitiv- 
ity and reconstruct spatial and spectral structure at the angular- 
resolution allowed by the highest observed frequency. 

The MS-MFS algorithm models the wide-band sky bright- 
ness distribution as a collection of multi-scale flux components 
whose amplitudes follow a Taylor-polynomial in frequency. The 
data products are a set of Taylor-coefficient images, which can 
either be interpreted in terms of continuum intensity, spectral 
index and curvature, or used to evaluate a spectral cube, or 
serve as a wide-band model for self-calibration or continuum- 
subtraction. For wide-field imaging, multiple pointing-centers 
(mosaicing) and the w-term are accounted for during image- 
formation, and the eff'ect of the primary beam and its frequency- 
dependence is approximately corrected in a post-deconvolution 
step. 

For point sources for which standard imaging has a dynamic- 
range limit of a few thousand, this algorithm achieves dynamic 
ranges > 10^ on test observations with the EVLA, and > 10^ 
on noise-free simulations. For sources with smooth continuum 
spectra, it is able to reconstruct spectral information at the an- 
gular resolution allowed by the combined multi-frequency uv- 
coverage. However, if the visibility functions of very large-scale 
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emission lie mostly within the central unsampled region of the 
wv-plane, the algorithm requires a-priori knowledge about the 
spectral structure at those scales. For high SNR data, error-bars 
on the spectral-index images range from < 0.02 when multi- 
scale basis functions are chosen appropriately, up to > 0.2 in the 
extreme case of modeling smooth extended emission with a set 
of ^-functions. In practice, on-source errors of Aor ^ 0.1 have 
been achieved. For low SNR extended structure (< 10), errors 
due to overfitting can dominate when a high-order polynomial is 
used. 

There are several directions in which wide-band imaging 
techniques need to be extended and improved. The imaging 
errors in MS-MFS are currently dominated by the multi- scale 
aspect of the algorithm, and methods that do adaptive fits 
(Bhatnagar & Cornwell, 2004) and use more a-priori informa- 
tion about large-scale spectra may be more appropriate. Methods 
that adaptively find the optimal number of parameters to op- 
erate with would help in error-control. Implementations of al- 
gorithms such as MS -CLEAN and MS-MFS are inefl&cient in 
memory-use, and other approaches may be required for large 
image sizes. For wide-field imaging, the time variation of the an- 
tenna primary beams must be taken into account during imaging, 
and wide-band methods combined with algorithms for direction- 
dependent corrections (A-Projection (Bhatnagar et al., 2008) or 
peeling (Smirnov, 2011)). For full-Stokes wide-band imaging, 
where a Taylor-polynomial in frequency is not the most appro- 
priate basis function to model Stokes Q,U,V emission, wide- 
band imaging with other flux models must be tried. 
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Appendix A: Matrix Notation and Frameworl^ 

The matrix notation used in this paper is explained here, in the 
context of an iterative minimization process used to solve 
a system of linear equations. The basic idea is as follows. Let 
Ax = ^ be the system of equations to be solved (measurement 
equations). The goal is to find a set of parameters x that mini- 
mizes;^^ = (Ax - b)^\N(Ax - b). Setting gradix^) = to min- 
imize leads to a new system of equations called the normal 
equations [A^WA]x = [A^]W^. The matrix on the LHS is called 
the Hessian [H] = [A^WA]. Iterations begin with an initial guess 
for the parameters x. These parameters are updated in iteration / 
as Xi+\ <— Xi + ^[H"^]A^W(Z? - AxO, where g controls the step- 
size. Iterations continue until a convergence criterion is satisfied. 
The basic iterative framework used in most imaging and decon- 
volution algorithms in radio interferometry can be described us- 
ing this matrix notation (Rau et al., 2009; Rau, 2010). 

A.1. Measurement Equations 

The measurement equation of an imaging instrument describes 
its transfer function (the eflfect of the measurement process on 
the input signal). For an ideal interferometer (a perfect spatial- 
frequency filter, with no instrumental gains), it can be written 
as follows in matrix notation as follows. Let be a pixe- 
lated image of the sky and let V^^^^ be a vector of n visibilities. 
Let S„xm be a projection operator that describes the instrument's 
sampling function (uv-coverage) as a mapping of m discrete spa- 
tial frequencies (pixels on a grid) to n visibility samples (usually 
n > m). Let F^xm be the Fourier transform operator. Then, the 
measurement equations become F^^'j = [S„xm][Fmxm]/^i. 

A.2. Normal Equations 

The normal equations are the linear system of equations whose 
solution gives a weighted least- squares estimate of a set of pa- 
rameters in a model (x^ minimization). For an ideal interfer- 
ometer, it is given by [F^S^WSFJ/^^^ = [F^S^WJF^^^ where 
\Nnxn is a diagonal matrix of weights and denotes the map- 
ping of measured visibilities onto a spatial-frequency grid. We 

can write the Normal equations as [H^^xml^^xi ~ ^mxi where the 
Hessian (matrix on the LHS) is by construction a convolution 
operator^^ with a shifted version of the point-spread function 
^mxi ~ ^/^^^^[F^S^WS] in each row. The dirty image on the RHS 
is produced by direct Fourier inversion of weighted visibilities. 
The normal equations therefore state that the dirty image is 

the result of the convolution of with /^^f . . The solution of 
these normal equations represents a deconvolution. 

The convolution of two vectors aic b is equivalent to the multipli- 
cation of their Fourier transforms. A 1-D convolution operator is con- 
structed from a and applied to b as follows. Let [A] = diag(a).Then, 
a b - [FV/ag([F]«)F]Z> - [O^b. Here, [F] is the Discrete Fourier 
Transform (DFT) operator. [C] is a Toeplitz matrix, with each row con- 
taining a shifted version of a. Multiplication of [C] with b implements 
the shift-multiply-add sequence required for the process of convolution. 
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A.3. Iterative solution 

Most existing iterative image reconstruction algorithms in radio 
interferometry consist of major and minor cycles. Major cycles 
compute the RHS of the normal equations, and minor cycles 
perform approximate (implicit) Hessian inversions to calculate 
updates for the sky model parameters I^. The first major cycle 
starts by transforming the observed visibilities into an image as 
/dirty ^ [FtStW]F°^^ and initializing the sky-model r. Minor 
cycle steps do a deconvolution to calculate updates to the model 
I^. After several iterations, this updated model is fed into the 
next major cycle. This and all subsequent major cycles calculate 
model visibilities from the current model = [SF]/'^, calcu- 
late residual visibilities V^^^ = V^^^ - V^, and transform these 
residuals into images = [F^S^W]V^^^ These residual im- 
ages replace the initial dirty image, and a new set of minor-cycle 
iterations are done. This process continues until a convergence 
criterion is reached. Usually, convergence is defined as being 
noise-like with no signal left to be extracted in the minor-cycle. 



Appendix B: MS-MFS as implemented in CASA 

The MS-MFS algorithm described in section 2.7 has been 
implemented and released via the CASA software package (ver- 
sion 3.1 onwards). More recently, it has been implemented in 
the ASKAPsoft^^ package. Algorithm 1 contains a pseudo-code 
listing. 

The main parameters that control the algorithm are 

1 . vo : a reference frequency chosen near the middle of the sam- 
pled frequency range, about which the Taylor expansion is 
performed, 

2. A^t • the number of coefficients of the Taylor polynomial to 
solve for, and 

3. A^s and /f^ : a set of scale sizes in units of image pixels to 
use for the multi- scale representation of the image. In order 
to always allow for the modeling of unresolved sources, the 
first scale function f^^ is forced to be a ^-function. 

The data products are A^t Taylor-coefficient images, a spectral 
index image, and a curvature image (if Nt > 2). This wide-band 
image model can be used within a standard self-calibration loop. 



Algorithm 1: MS-MFS, as implemented in CASA 



Data: calibrated visibilities : V^^^ Vv 

Data: wv-sampling function and weights : [Sy], [W^] 

Data: input : number of Taylor- terms number of scales A^s 

Data: input : image noise threshold, (Xthr, loop gain g 

Data: input : scale basis functions : 

Data: input : reference frequency vq to compute Wy = (^^) 
Result: model coefficient images •.If'ite {0, A^t - 1} 
Result: intensity, spectral index and curvature : /p" 



1 for ^ G {0,A^t - e [t,N, - 1} do 



/shp ^shp ^psf 
^ ^ p ^ ^tq 



2 Compute the spectral Hessian kernel /^^ 

3 for ^ G {0, A^s - e {s,N, - 1} do 
I Compute scale- spectral kernels P^l ■ 

I tq 

end 

6 end 

7 for 5 G {0,A^s - l}do 

I Construct [Hf"^] from mid(ff,) and compute [Hf"^"^ ] 
9 end 

10 Initialize the model If" for all ? G {0, A^t - 1} 



11 repeat 



/* Major Cycle */ 



for^G{0,M-l}do 

Compute I]^^ = ^yW[r^^ from residual visibilities VJf^ 
for^G{0,A^s-l}do 

I Convolve with s'^ scale-function = /f ^ * F,^' 
end 

end 

Calculate minor-cycle threshold /umit from F^^ 



repeat /* Minor Cycle */ 

for5G{0,A^s-l}do 
foreach pixel do 

Construct f^^, an A/t X 1 vector from 

IT \fte{0,Nrl} 

t 

Compute principal solution = [Hf'^'V,^' 

Fill solution if into model images : I^""^ 

t 

end 

Choose C = maxirf'K V s e {0, A^s-1}} 

t t=0 

end 

for^ G {0,M- l}do 

Update the model image : 1^ = If + g [/f^ * 

for5G{0,A^s-l}do 

Update the residual image : 



12 
13 
14 

15 
16 
17 

18 

19 
20 
21 
22 

23 

24 
25 

26 
27 
28 

29 
30 
31 

32 
33 

34 
35 
36 

37 until Peak residual in F^^ < crthr 

38 Restore the model Taylor-coefiicients /J^ G {O.A^t - 1} 

39 Calculate Co' C' ^ from IfVte {0.^ - 1} 

40 If required, remove average primary beam : 

inew Tm / n . rnew jm n . rnew jm n 



t ^ tq t 



end 

end 

until Peak residual in F^^ < /umit 



Compute model visibilities from F^ \ft G {O.M - 1} 
Compute residual visibilities = Vf' - 



ASKAPsoft is the software package being developed at the CSIRO 
for the ASKAP telescope. 
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